library("astsa")

March 28, 2016

1.1: Introduction

The analysis of experimental data that have been observed at different points in “time” (usually equally spaced).

The obvious correlation introduced by sampling adjacent points in time can severely restrict the applicability of conventional statistical methods that assume independent and identically distributed observations.

The first step to analysis is to plot the data as a function of time.

The two statistical methods used to analyze time series data are the

  1. time domain approach
  2. frequency domain approach


Methodology

Assumption: the observed pattern continues into the future.


Components

  1. Trend
    • The long-run upward or downward movement of the series.
  2. Cycle
    • The upward and downward movement of the series around the trend (think of a wave).
  3. Seasonal variations
    • The yearly (weekly, quarterly, …) patterns in the data (think of temperature).
  4. Irregular fluctuations
    • The erratic movements in the series that cannot be accounted for by a pattern.

The goal is for only these irregular fluctuations (white noise) to remain after the modeling is done.
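A quick way to see these components separately (a sketch, not from the lecture) is R's built-in `decompose()`, applied here to the classic `co2` dataset that ships with R:

```r
# Split the monthly Mauna Loa CO2 series into trend, seasonal,
# and irregular ("random") components using an additive model.
d <- decompose(co2)
plot(d)  # observed, trend, seasonal, and random panels

# If the modeling captured the structure, the leftover random
# component should hover around zero, like white noise.
mean(d$random, na.rm = TRUE)
```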

1.2: The Nature of Time Series Data

Example Johnson & Johnson Quarterly Earnings
These are quarterly earnings over 84 quarters (21 years) from 1960 to 1980.

library("astsa")
#Example 1.1: JandJ Quarterly Earnings
plot(jj, type="o", ylab="Quarterly Earnings per Share")

#log transformation:
plot(log(jj), type="o", ylab="Quarterly Earnings per Share")

Example Global Warming
The data are global mean land-ocean temperature index from 1880 to 2009, with base period 1951 - 1980. These are deviations, measured in degrees centigrade, from the 1951 - 1980 average.

#Example 1.2: Global Warming
plot(gtemp, type="o", ylab="Global Temperature Deviations")

Example Speech Data
This example investigates computer recognition of speech. The data show a 0.1-second sample of recorded speech for the phrase “aaa … hhh”; note the repetitive nature of the signal and the regular periodicities.

#Example 1.3: Speech Data
plot(speech)

Example NY Stock Exchange
A financial time series: the daily returns of the NYSE from February 2, 1984 to December 31, 1991.

#Example 1.4: NYSE
plot(nyse, ylab="NYSE Returns")

Example El Nino and Fish Population
Looking at multiple time series simultaneously. These data represent monthly values of an environmental series called the Southern Oscillation Index (SOI) and the associated recruitment series (an index of the number of new fish). Both series cover a period of 452 months ranging from 1950 to 1987.

SOI measures changes in air pressure related to sea surface temperatures in the Central Pacific Ocean.

#Example 1.5: El Nino
par(mfrow = c(2,1)) # set up the graphics
plot(soi, ylab="", xlab="", main="Southern Oscillation Index")
plot(rec, ylab="", xlab="", main="Recruitment")

Example fMRI Imaging
Here we have a collection of independent series (a vector of series) generated under varying experimental conditions or treatment configurations: data collected from various locations in the brain via functional magnetic resonance imaging (fMRI). Five subjects were given periodic brushes on the hand. The stimulus was applied for 32 seconds and then stopped for 32 seconds. The sample rate was one observation every 2 seconds for 256 seconds (\(n=128\)). The series are consecutive measures of blood oxygenation-level dependent (BOLD) signal intensity, which measures the activity of the brain.

#Example 1.6: fMRI Imaging
par(mfrow=c(2,1), mar=c(3,2,1,0)+.5, mgp=c(1.6,.6,0))
ts.plot(fmri1[,2:5], col=1:4, lty=c(1,2,4,5), ylab="BOLD", 
        xlab="", main="Cortex")
ts.plot(fmri1[,6:9], col=1:4, lty=c(1,2,4,5), ylab="BOLD", 
        xlab="", main="Thalamus & Cerebellum")
mtext("Time (1 pt = 2 sec)", side=1, line=2)

Wednesday, March 30, 2016
Example Earthquakes and Explosions
This represents two phases (or arrivals) along the surface, denoted \(P\ (t=1, \dots, 1024)\) and \(S\ (t=1025, \dots, 2048)\), at a seismic recording station. The recording instruments in Scandinavia are observing earthquakes and mining explosions. A general problem of interest is to distinguish or discriminate between waveforms generated by earthquakes and those generated by explosions.

library(astsa)
par(mfrow=c(2, 1))
plot(EQ5, main="Earthquake")
plot(EXP6, main="Explosion")

1.3: Time Series Statistical Models

par(mfrow=c(1,1))
w <- rnorm(500)
plot.ts(w, main="White Noise")

v <- filter(x=w, filter=c(1/3, 1/3, 1/3), sides=2)
par(mfrow=c(2,1))
plot.ts(w, main="White Noise")
plot.ts(v, ylim=c(-3, 3), main="Moving Average")

x <- filter(w, filter=c(1, -.9), method='recursive')
plot.ts(x, main='AR(2)')


Note that the beginning of the series looks odd. For this reason, we usually introduce a “burn-in” period by adding extra points at the beginning and removing them after the fit.

AR(2) stands for an autoregressive model of order 2.

w.new <- rnorm(550) # Add 50 to avoid startup problems
x.new <- filter(w.new, filter=c(1, -.9), method='recursive')[-(1:50)]
plot.ts(x.new, main="AR(2)")

set.seed(123)
w <- rnorm(200)
x <- cumsum(w) # random walk w/o drift
wd <- w + .3
xd <- cumsum(wd) # random walk w/ drift
par(mfrow=c(1,1))
plot.ts(x, ylim=c(-10, 70), main="a random walk")
lines(xd, col='red') # w/ drift

The Basics

Let the value of the series at some time point \(t\) be \(x_t\). Then the observed values can be represented as \(x_1\), the initial time point, \(x_2\), the second time point, and so forth, up until \(x_n\), the last observed point.

1.4: Measures of Dependence

Definition The mean function is defined as \[ \mu_{xt} = E(x_t) \] provided it exists, where \(E\) denotes the usual expectation operator.

The notation specifies which series and at which time we are finding the mean for.

If we only have one series, we would simplify the notation to \(\mu_t\)

Example (Moving Average) Note, the mean of a white noise series is zero at all \(t\), \[ \mu_{wt}=E(w_t) = 0 \ \ \ \forall t \] If we look at the previous moving average example, the mean function is \[ E(v_t) = \frac{1}{3}[E(w_{t-1}) + E(w_t) + E(w_{t+1})] = 0 \] Note, this happens to not be a function of time.

Problem (Autoregressive) For the previous autoregressive example (\(x_t = x_{t-1} - 0.9x_{t-2} + w_t\)), we are assuming the mean is constant. Find the mean function.
Solution If the mean is constant, then \(x_i \ \forall i\) is constant, say, \(\mu\) \[ \begin{aligned} E(x_t) &= E[x_{t-1} - 0.9x_{t-2} + w_t] \\ \mu &= E[x_{t-1}] - 0.9E[x_{t-2}] + E[w_t] \\ \mu &= \mu - 0.9\mu + 0 \\ \mu &= 0.1 \mu \\ 0.9\mu &= 0 \\ \mu &= 0 \end{aligned} \]

Example In the previous problem, consider adding a constant to the model: \[ x_t = c + x_{t-1} - 0.9 x_{t-2} + w_t \] Then the mean function becomes \[ \begin{aligned} \mu &= E(x_t) \\ &= E(c + x_{t-1} - 0.9 x_{t-2} + w_t) \\ &= E(c) + E(x_{t-1}) - 0.9 E(x_{t-2}) + E(w_t) \\ &= c + \mu - 0.9 \mu + 0 \\ \mu(1 - 1 + 0.9) &= c \\ \mu &= \frac{c}{0.9} \end{aligned} \] The result is counterintuitive. If the mean function is zero, and we add a constant \(c\), then we would expect that the new mean function would be \(c\).
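The \(\mu = c/0.9\) result can be checked by simulation (a sketch; the constant \(c=0.9\) and the series length are arbitrary choices). With \(c=0.9\), the mean should settle near \(0.9/0.9 = 1\), not \(0.9\):

```r
set.seed(1)
c0 <- 0.9                       # the added constant c
n  <- 100000
w  <- rnorm(n)
x  <- numeric(n)
# x_t = c + x_{t-1} - 0.9 x_{t-2} + w_t, started at zero
for (t in 3:n) x[t] <- c0 + x[t - 1] - 0.9 * x[t - 2] + w[t]
mean(x[-(1:500)])               # drop a burn-in; close to c/0.9 = 1
```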

Definition The autocovariance function is defined as the second moment product \[ \begin{aligned} \gamma_x (s, t) &= \text{cov}(x_s, x_t) \\ &= E[(x_s - \mu_s)(x_t - \mu_t)] \end{aligned} \] for all \(s\) and \(t\).
\(\gamma_x(s,t)=\gamma_x(t,s)\).

The autocovariance measures the linear dependence between two points on the same series, possibly observed at different times.

When \(s=t\), we have the variance at time \(t\).

\[ \gamma(t,t) = E[(x_t - \mu_t)^2] = \text{Var}(x_t) \]

As with most variances, this quantity is usually unknown.

We would have a problem with estimability if we let the variance at time \(t\) be different for all \(t\) (can’t estimate variance with one observation).

Monday, April 4, 2016

Example White Noise: For the white noise series \(w_t\), \[ \begin{aligned} \gamma_w(s,t) &= \text{Cov}(w_s, w_t) \\ &= E[(w_s - \mu_{ws})(w_t - \mu_{wt})] \\ &= E[(w_s - 0)(w_t - 0)] \\ &= E(w_s w_t) \\ &= \begin{cases} \sigma_w^2, & s=t \\ 0, & s \neq t \end{cases} \end{aligned} \]

Example Moving Average: Find the autocovariance of the moving average series from the previous example \[ v_t = \frac{1}{3}(w_{t-1} + w_t + w_{t+1}) \] \[ \begin{aligned} \gamma_v(s,t) &= E[(v_s - 0)(v_t - 0)] \\ &= E\left[\left(\frac{w_{s-1}}{3} + \frac{w_s}{3} + \frac{w_{s+1}}{3} \right)\left(\frac{w_{t-1}}{3} + \frac{w_t}{3} + \frac{w_{t+1}}{3} \right)\right] \\ &= E\bigg(\frac{w_{s-1} w_{t-1}}{9} + \frac{w_{s-1} w_t}{9} + \frac{w_{s-1} w_{t+1}}{9} + \\ & \ \ \ \ \ \ \ \ \ \ \ \frac{w_s w_{t-1}}{9} + \frac{w_s w_t}{9} + \frac{w_s w_{t+1}}{9} + \\ & \ \ \ \ \ \ \ \ \ \ \ \frac{w_{s+1} w_{t-1}}{9} + \frac{w_{s+1} w_t}{9} + \frac{w_{s+1} w_{t+1}}{9}\bigg) \\ \end{aligned} \] When the subscripts don’t match, the expected value of their products is zero. As a result, this autocovariance function yields: \[ \gamma_v(s,t) = \begin{cases} \frac{3}{9} \sigma^2_w, & s=t \\ \frac{2}{9} \sigma^2_w, & |s-t|=1 \ \ \ \ \text{i.e.} \ s=t+1 \ \text{or} \ s= t-1 \\ \frac{1}{9} \sigma^2_w, & |s-t|=2 \\ 0, & |s-t| \geq 3 \end{cases} \] This is interesting because it doesn’t matter where in the series you are; only the distance between the points matters.

Problem Random Walk: Find the autocovariance of the random walk with drift. Use \(x_t = \delta t + \sum_{j=1}^t w_j\).
Solution Recall the mean function: \(\mu_t = E(x_t) = \delta t\)
\[ \begin{align*} \gamma_x(s,t) &= E[(x_s - \delta s)(x_t - \delta t)] \\ &= E\Big[\big([\delta s + \sum_{j=1}^s w_j] - \delta s \big)\big([\delta t + \sum_{i=1}^t w_i] - \delta t \big)\Big] \\ &= E \left[\sum_{j=1}^s w_j \sum_{i=1}^t w_i \right] \\ &= E(w_1 w_1 + w_1w_2 + \cdots + w_sw_t) \\ &= \min(s,t)\,\sigma^2_w \end{align*} \] The last equality comes from the fact that if the subscripts aren’t equal, then the expected value is zero. The number of terms with matching subscripts is the smaller of the two counts, \(s\) and \(t\).

Note that the \(\delta s\) and \(\delta t\) terms cancelled in our derivation, showing that the autocovariance function is the same with or without the drift.
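The \(\min(s,t)\sigma^2_w\) result can be verified by simulating many independent random walks and estimating covariances across replications (a sketch; the sizes are arbitrary):

```r
set.seed(4866)
reps <- 20000
n    <- 20
# each column is one random walk x_t = w_1 + ... + w_t (sigma_w = 1)
x <- apply(matrix(rnorm(reps * n), n, reps), 2, cumsum)
cov(x[5, ], x[10, ])   # theory: min(5, 10) * sigma_w^2 = 5
var(x[10, ])           # theory: min(10, 10) * sigma_w^2 = 10
```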

Definition The autocorrelation function (ACF) is defined as \[ \rho (s,t) = \frac{\gamma(s,t)}{\sqrt{\gamma(s,s) \gamma(t,t)}} \] As before, this is a measure of linear dependence and can be viewed as a measure of linear predictability of the series at time \(t\), \(x_t\), using only the value \(x_s\).

Example Moving Average: Find the ACF \[ \rho(s,t) = \begin{cases} 1, & |s-t| = 0 \\ \frac{2}{3}, & |s-t| = 1 \\ \frac{1}{3}, & |s-t| = 2 \\ 0, & |s-t| \geq 3 \end{cases} \] The equation comes from our derived values in the moving average example for autocovariance. For example, for \(|s-t|=1\), \[ \begin{aligned} \rho (s,t) &= \frac{\gamma(s,t)}{\sqrt{\gamma(s,s) \gamma(t,t)}} \\ &= \frac{\frac{2}{9} \sigma_w^2}{\sqrt{\frac{1}{3} \sigma_w^2 \cdot \frac{1}{3} \sigma_w^2}} \\ &= \frac{\frac{2}{9} \sigma_w^2}{\frac{1}{3} \sigma_w^2} \\ &= \frac{2}{3} \end{aligned} \] The ability to make predictions between time points depends on the distance; greater distances make prediction more difficult. At a distance of 3 or more, we have no information with which to make a prediction.

Definition The cross-covariance function between two series \(x_t\) and \(y_t\), is \[ \begin{aligned} \gamma_{xy}(s,t) &= \text{Cov}(x_s, y_t) \\ &= E[(x_s - \mu_{xs})(y_t - \mu_{yt})] \\ \end{aligned} \] This is meant to help one to measure the predictability of one series based on another.

We can scale this function so that we have measurement that lies between -1 and 1, which is called the cross-correlation function.

Definition The cross-correlation function (CCF) is defined as \[ \rho_{xy}(s,t) = \frac{\gamma_{xy}(s,t)}{\sqrt{\gamma_x(s,s) \gamma_y(t,t)}} \]

1.5: Stationary Time Series

A strictly stationary time series is one for which the probabilistic behavior of \(x_{t_1}, x_{t_2}, \dots, x_{t_k}\) is identical to that of the shifted set \(x_{t_1 + h}, x_{t_2 + h}, \dots, x_{t_k + h}\) for any collection of time points \(t_1, t_2, \dots, t_k\) and for any shift (lag) \(h\). Unfortunately, this definition is too strong for most applications.

There exists a weaker version that only puts constraints on the first two moments.

Definition A weakly stationary time series is a finite variance process where the following two conditions must be met

  1. the mean value function, \(\mu_t\), is a constant and does not depend on time \(t\).
    • So \(E(x_t) = \mu\) for all \(t\).

  2. the autocovariance function \(\gamma(s,t)\) depends on \(s\) and \(t\) only through their difference \(|s-t|=h\).
    • Also, because the specific values of \(s\) and \(t\) are irrelevant (only the difference matters), then we can simplify the notation. Let \(s=t+h\) where \(h\) is the lag time. Then \[ \begin{align*} \gamma(s,t) &= \gamma(t+h, t) \\ &= \text{Cov}(x_{t+h}, x_t) \\ &= \text{Cov}(x_h, x_0) \\ &= \gamma(h, 0) \end{align*} \] which is true because the covariance of two values at time \(t\) and \(t+h\) is the same as if those values were at \(0\) and \(h\). To simplify further, \(\gamma(h,0) \equiv \gamma(h)\).

An alternative definition of weakly stationary (from PSU):

  1. The mean \(E(x_t)\) is the same for all \(t\).
  2. The variance of \(x_t\) is the same for all \(t\).
  3. The covariance (and correlation) between \(x_t\) and \(x_{t-h}\) is the same for all \(t\).

Any reference to a stationary time series hereafter is referring to a weakly stationary time series.

Definition 1.8 The autocovariance function of a stationary time series is written as
\[ \gamma_x(h) = E[(x_{t+h} - \mu)(x_t - \mu)] \]

Definition 1.9 The autocorrelation function (ACF) of a stationary time series becomes \[ \begin{align*} \rho_x(h) &= \frac{\gamma(t+h, t)}{\sqrt{\gamma(t+h, t+h) \gamma(t, t)}} \\ &= \frac{\gamma_x(h)}{\gamma_x(0)} \end{align*} \] \(\rho_x(h)\) still remains between \(-1\) and \(1\).

Note that \(\gamma(h)=\gamma(-h)\) since \[ \begin{aligned} \gamma(-h) &= \text{Cov}(x_{t-h}, x_t) \\ &= E[(x_{t-h} - \mu)(x_t - \mu)] \\ &= E[(x_t - \mu)(x_{t-h} - \mu)] \\ &= \text{Cov}(x_t, x_{t-h}) \\ &= \gamma(h) \end{aligned} \] where the last step follows by shifting the time index from \(t\) to \(t+h\), which is allowed under stationarity. Because of this property, plotting the ACF and ACovF with respect to \(h\) does not require including the negative lags. The function is symmetric about \(h=0\).

Example Let \(w_t\) be white noise, that is \(w_t \stackrel{iid}{\sim} (0, \sigma^2_w = 1)\) and \(x_t = w_t - 0.9w_{t-1}\). This is called a first-order moving average series.
Here, \(E(x_t)=0 - 0.9 \cdot 0 =0\) and \(E(w^2_t)=1\)

(Aside: \(E(w^2_t)=1\) because if, by definition, \(\text{Var}(X)= E(X^2) - \mu^2\), then \(\sigma^2_w = E(w^2) - 0^2=1\) which means that \(E(w^2)=1\)).

Using the definition of a stationary time series ACovF (Def 1.8) \[ \begin{align*} \gamma_x(h) &= E[(x_{t+h} - \mu)(x_t - \mu)] \\ &= E(x_{t+h}x_t - \mu x_{t+h} - \mu x_t + \mu^2) \\ &= E(x_{t+h}x_t) - \mu E(x_{t+h}) - \mu E(x_t) + \mu^2 \\ &= E(x_{t+h}x_t) - 0 \cdot E(x_{t+h}) - 0 \cdot E(x_t) + 0 \\ &= E(x_{t+h}x_t) \\ &= E[(w_{t+h} - 0.9 w_{t+h-1})(w_t - 0.9 w_{t-1})] \\ &= E[w_{t+h}w_t -0.9 w_{t+h} w_{t-1} - 0.9 w_t w_{t+h-1} + 0.9^2 w_{t+h-1} w_{t-1}] \\ &= E(w_{t+h}w_t) -0.9 E(w_{t+h} w_{t-1}) - 0.9 E(w_t w_{t+h-1}) + 0.9^2 E(w_{t+h-1} w_{t-1}) \end{align*} \] When \(h=0\) we have \[ \begin{align*} \gamma_x(0) &= E(w_{t}w_t) -0.9 E(w_{t} w_{t-1}) - 0.9 E(w_t w_{t-1}) + 0.9^2 E(w_{t-1} w_{t-1}) \\ &= 1 - 0.9 \cdot 0 - 0.9 \cdot 0 + 0.9^2 \cdot 1 \\ &= 1 + 0.9^2 \end{align*} \] We can similarly plug in other values, which gives the equation \[ \gamma_x(h) = \begin{cases} 1 + 0.9^2, & h=0 \\ -0.9, & h= \pm 1 \\ 0, & |h| \geq 2 \end{cases} \] For the ACF of a stationary time series, we have it defined as \(\rho_x(h)= \gamma_x(h) / \gamma_x (0)\).
When \(h=0\), we have \(\rho_x(0)= \gamma_x(0) / \gamma_x (0) = 1\). When \(h=\pm 1\), we have \(\rho_x(\pm 1) = \gamma_x(\pm 1)/\gamma_x(0) = -0.9/(1+0.9^2) \approx -0.497\). \[ \rho_x(h) = \begin{cases} 1, & h=0 \\ - \frac{0.9}{1+0.9^2}, & h = \pm 1 \\ 0, & |h| \geq 2 \end{cases} \]
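The theoretical ACF can be confirmed with `ARMAacf()` from base R's stats package (not used in lecture), which computes the exact ACF of an ARMA model; here the series is an MA(1) with \(\theta = -0.9\):

```r
# Exact ACF of x_t = w_t - 0.9 w_{t-1}
rho <- ARMAacf(ma = -0.9, lag.max = 3)
rho
# lag 1 equals -0.9 / (1 + 0.9^2) = -0.4972...; lags >= 2 are 0
```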

How to simulate a series

# what this all means will be explained in a future lecture
# first order moving avg
foma <- arima.sim(model=list(order=c(0,0,1), ma=-0.9), n=200) # 200 obs
plot.ts(foma)

Problem Is a random walk weakly stationary?
Solution Recall that a random walk with drift takes the form \(x_t = \delta t + \sum_{j=1}^t w_j\), where the mean function is \(\mu_t = E(x_t) = \delta t\). Clearly the mean function depends on \(t\), so it is not stationary.

Without drift, \(\gamma_x(t+h,t)=\min(t+h,t)\,\sigma_w^2\), which clearly depends on \(t\). So this is also not stationary.

Example Is \(x_t=\beta_0 + \beta_1 t + w_t\) weakly stationary?
Solution No, since \(E(x_t) = \beta_0 + \beta_1t\) which depends on \(t\).

In any case, ignore the mean function and consider the second criterion for a weakly stationary time series: the autocovariance \(\gamma(t+h,t) = \text{Cov}(w_{t+h}, w_t)\) depends only on \(h\), not on \(t\). We may therefore consider the series as having stationary behavior around a linear trend, called trend stationarity.

April 06, 2016
Some simplification also occurs when looking at multiple time series, if we assume they are jointly stationary.

Definition 1.10 Two time series, say \(x_t\) and \(y_t\), are said to be jointly stationary if they are each stationary and the cross-covariance function \[ \begin{align*} \gamma_{xy}(h) &= \text{Cov}(x_{t+h}, y_t) \\ &= E[(x_{t+h} - \mu_x)(y_t - \mu_y)] \end{align*} \] is a function of only the lag, \(h\).

Scaling this, we get:

Definition 1.11 The cross-correlation function (CCF) of jointly stationary time series \(x_t\) and \(y_t\) is defined as \[ \rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0) \gamma_y(0)}} \] Note that, in general, \(\rho_{xy}(h) \neq \rho_{xy}(-h)\). The order of the series matters.

However, it is the case that \[ \rho_{xy}(h) = \rho_{yx}(-h) \]

Problem Show \(\rho_{xy}(h) = \rho_{yx}(-h)\)

Solution It is sufficient to show that \(\gamma_{xy}(h) = \gamma_{yx}(-h)\) \[ \begin{align*} \gamma_{xy}(h) &= \gamma_{xy}(t+h-t) \\ &= E[(x_{t+h}-\mu_x)(y_t - \mu_y)] \\ &= E[(y_t - \mu_y)(x_{t+h} - \mu_x)] \\ &= \gamma_{yx}(t-(t+h)) \\ &= \gamma_{yx}(-h) \end{align*} \]

Example Suppose that \(y_t = \beta x_{t - \ell} + w_t\) for \(\{x_t\}\) stationary and uncorrelated to \(\{w_t\}\), where \(w_t \sim WN(0, \sigma^2_w)\).
Then \[ \begin{align*} \gamma_{xy}(h) &= \text{Cov}(x_{t+h}, y_t) \\ &= \text{Cov}(x_{t+h}, \beta x_{t - \ell} + w_t) \\ &= \text{Cov}(x_{t+h}, \beta x_{t - \ell}) \\ &= \beta \text{Cov}(x_{t+h}, x_{t - \ell}) \\ &= \beta \gamma_x(h + \ell) \end{align*} \] Jointly stationary? Yes. To prove it, you must show \(x_t\) and \(y_t\) are stationary, and that the cross-covariance function depends only on the lag, not on time.

Incidentally, if \(\ell > 0\), we say \(x_t\) leads \(y_t\) and if \(\ell < 0\), we say \(x_t\) lags \(y_t\).

When \(x_t\) leads \(y_t\), the spike in the CCF will be at a negative lag (which is what we want).

set.seed(4866)  
x <- rnorm(100)
y <- lag(x, -5) + rnorm(100) #beta=1, rnorm(100) is wt's
ccf(x, y, ylab="CCovF", type="covariance")

Example in R. The graph shown is actually the estimated cross-covariance function, which we haven’t defined yet.

Time Series Relationships

Definition 1.12 A linear process, \(x_t\), is defined to be a linear combination of white noise variates, \(w_t\), and is given by \[ x_t = \mu + \sum_{j = - \infty}^{\infty} \psi_j w_{t - j} \] where \(\sum_{j = - \infty}^{\infty} |\psi_j| < \infty\).

Here, the mean function is \(E(x_t) = \mu\). The autocovariance function is \[ \gamma_x(h) = \sigma^2_w \sum_{j = - \infty}^{\infty} \psi_{j+h} \psi_j \ \ \text{for} \ h \geq 0 \] (use moving average example from monday to prove this)

\(\{x_t\}\) is stationary.
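Plugging the three-point moving average from Section 1.4 into this formula (\(\psi_{-1} = \psi_0 = \psi_1 = 1/3\), all other \(\psi_j = 0\)) reproduces the autocovariances derived earlier. A small sketch:

```r
# gamma(h) = sigma_w^2 * sum_j psi_{j+h} * psi_j
psi <- c(1/3, 1/3, 1/3)   # psi_{-1}, psi_0, psi_1; zero elsewhere
gamma_h <- function(h, sigma2 = 1) {
  total <- 0
  for (j in seq_along(psi)) {          # sum products of overlapping weights
    k <- j + h
    if (k >= 1 && k <= length(psi)) total <- total + psi[k] * psi[j]
  }
  sigma2 * total
}
sapply(0:3, gamma_h)  # 3/9, 2/9, 1/9, 0 -- matching the earlier derivation
```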

What if \(j < 0\)? If, say, \(j=-2\), then the \(w_{t-j}\) term of \(x_t = \mu + \sum_{j = - \infty}^{\infty} \psi_j w_{t - j}\) is \(w_{t+2}\). That means \(x_t\) is modeled using something happening in the future, which doesn’t make much sense. So if terms with \(j < 0\) are present, \(x_t\) depends on future values of \(w_t\).

As such, we will focus on processes that do not depend on the future, called causal, where \(\psi_j=0\) for \(j<0\).

Definition 1.13 A process, \(x_t\), is said to be a Gaussian process if the \(n\)-dimensional vectors \(\mathbf{x}=(x_{t_1}, x_{t_2}, \dots, x_{t_n})\), for every collection of distinct time points \(t_1, \dots, t_n\), have a non-singular multivariate normal distribution.

The non-singular property means that the variance-covariance matrix for \(\mathbf{x}\) is positive definite.

If the Gaussian process is stationary, then \(\mu_t = \mu\) and \(\gamma(t_i, t_j) = \gamma(|t_i - t_j|)\), so the mean vector and covariance matrix are independent of time.

1.6: Estimation of Correlation

Assuming Stationarity
If we do not have stationarity, we could have a different covariance to estimate for every pair of time points (non-estimable). Therefore, the assumption of stationarity is critical for many problems.

If we have a stationary time series, then the mean function \(\mu_t=\mu\), and so we can estimate this using the sample mean \[ \hat{\mu} = \overline{x} = \frac{1}{n}\sum_{t=1}^n x_t \] The standard error of this estimator is not as trivial as it was before since the observations are correlated.

We have \[ \begin{align*} \text{Var}(\hat{\mu}) &= \text{Var}\left( \frac{1}{n} \sum_{t=1}^n x_t \right) \\ &= \frac{1}{n^2}\text{Cov}\left(\sum_{t=1}^n x_t, \sum_{s=1}^n x_s \right) \\ &= \frac{1}{n^2} [n \gamma(0) + (n-1)\gamma(1) + (n-2) \gamma(2) + \cdots + (n- (n-1))\gamma(n-1) + (n-1)\gamma(-1) + (n-2)\gamma(-2) + \cdots + (n-(n-1)) \gamma(1-n)] \\ &= \frac{1}{n} \sum_{h=-n}^n \left(1 - \frac{|h|}{n} \right) \gamma(h) \end{align*} \] However, this result isn’t directly useful because \(\gamma(h)\) is almost never known. We need an estimate of \(\gamma(h)\).
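For a series with known autocovariances, the formula can be checked against a Monte Carlo estimate. A sketch using the MA(1) \(x_t = w_t - 0.9 w_{t-1}\) from earlier, whose \(\gamma(0)=1.81\) and \(\gamma(\pm 1)=-0.9\) (the series length and replication count are arbitrary):

```r
set.seed(123)
n <- 50
# Var(xbar) = (1/n) * sum_h (1 - |h|/n) * gamma(h)
theory <- (1.81 + 2 * (1 - 1/n) * (-0.9)) / n

# Monte Carlo: variance of the sample mean across simulated series
means <- replicate(20000, {
  w <- rnorm(n + 1)
  mean(w[-1] - 0.9 * w[-(n + 1)])   # x_t = w_t - 0.9 w_{t-1}
})
c(theory = theory, monte_carlo = var(means))
```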

The Sample Correlation Functions

One technique used for visualizing the relations between a series at different lags is the scatterplot matrix. This is too complicated to do manually.

Recall the Southern Oscillation Index data:

plot.ts(soi)

lag.plot(soi, lags=12, diag=F)

This is a useful preliminary analysis before diving deeply into the data. At the upper right-hand corner, it appears that there is a positive relationship for a lag of 1.

The ACF tells us whether a substantial linear relationship exists between a series and its own lagged values.

Since the true ACF is unknown, we utilize a sample version of the ACF \[ \hat{\rho}_x(h) = \frac{\hat{\gamma}_x(h)}{\hat{\gamma}_x(0)} \] where \[ \hat{\gamma}_x(h) = \frac{1}{n} \sum_{t=1}^{n-h} (x_{t+h}- \overline{x})(x_t - \overline{x}) \] with \(\hat{\gamma}(h) = \hat{\gamma}(-h)\) for \(h=0,1, \dots, n-1\).
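These estimators can be implemented directly and compared with R's `acf()`, which uses the same \(1/n\) divisor. A sketch on simulated data:

```r
set.seed(1)
x <- rnorm(200)
n <- length(x); xbar <- mean(x)

# sample autocovariance at lag h
gamma_hat <- function(h) sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
rho_hat <- sapply(0:5, gamma_hat) / gamma_hat(0)

# agrees with acf() to rounding error
cbind(manual = rho_hat, acf = acf(x, lag.max = 5, plot = FALSE)$acf[, , 1])
```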

acf(soi) # autocorrelation function

At lag 0, the ACF is always 1. It appears cyclical, exhibiting seasonal fluctuations.

Note: this model is bad, as it doesn’t account for many variables. Later we’ll develop a better one.

April 11, 2016
\[ \hat{\gamma}(h) = \frac{1}{n}\sum^{n-h}_{t=1}(x_{t+h}-\overline{x})(x_t-\overline{x}) \] The sum runs over a restricted range since \(t+h\) cannot exceed \(n\).

Dividing by \(n\) (or \(n-h\)) will not result in an unbiased estimator.

Under the assumption that the process \(x_t\) is white noise, the approximate standard error of the sample ACF is \(\hat{\sigma}_{\hat{\rho}} \approx \frac{1}{\sqrt{n}}\).

If we further assume normality, this will yield a way to get an interval estimate of \(\rho_x(h)\) at any lag.

Example ACF of recruitment data.

acf(rec)

Could this be a white noise series? The blue lines represent \(\pm 2/\sqrt{n}\), an approximate 95% band. If this were a white noise series, nearly all of the sample ACF values (other than lag 0) would fall within the blue lines.

Recall, for white noise \[ \rho(h) = \begin{cases} 1, & h=0 \\ 0, & \text{elsewhere} \end{cases} \] If there are significant lags (other than zero), this suggests the series is not white noise, and that there is more information to get from the series.

What the ACF reveals for white noise:

w <- rnorm(5000)
acf(w)

This is what we would want if we were to plot the residuals of a time-series model.

Large-Sample Distribution of the ACF
For large \(n\) and under mild conditions, the sample ACF is approximately normal with mean zero and standard deviation \(n^{-1/2}\).

So approximately 95% of the sample ACFs should be within \(\pm \frac{2}{\sqrt{n}}\) of zero. This can be used to assess the whiteness of a series.
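The whiteness check amounts to counting how many sample ACF values land outside \(\pm 2/\sqrt{n}\); for true white noise, only about 5% should exceed the band by chance (a seeded sketch):

```r
set.seed(4866)
w <- rnorm(2000)                                   # white noise
r <- acf(w, lag.max = 100, plot = FALSE)$acf[-1]   # drop lag 0
bound <- 2 / sqrt(length(w))
mean(abs(r) > bound)   # proportion outside the band; expect ~0.05
```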

Example (Moving Average) Consider the previous moving average model \[ v_t = \frac{1}{3} (w_{t-1} + w_t + w_{t+1}) \] and two realizations of it based on different sample sizes.

set.seed(4866)
w1 <- rnorm(12)
w2 <- rnorm(102)
v1 <- filter(x=w1, sides=2, rep(1/3, 3))[-c(1, 12)] # eliminate first and last observations
v2 <- filter(x=w2, sides=2, rep(1/3, 3))[-c(1, 102)]
plot.ts(v1)

plot.ts(v2)

acf(v1, lag.max=4, plot=FALSE)
## 
## Autocorrelations of series 'v1', by lag
## 
##      0      1      2      3      4 
##  1.000  0.611  0.386 -0.114 -0.262
acf(v2, lag.max=4, plot=FALSE)
## 
## Autocorrelations of series 'v2', by lag
## 
##      0      1      2      3      4 
##  1.000  0.615  0.229 -0.155 -0.167

Recall, the actual values are \(1, 2/3, 1/3, 0 , 0, \dots\) (see earlier example). Compare that to our observations above.

par(mfrow = c(2,1))
acf(v1, lag.max=4, plot=TRUE)
acf(v2, lag.max=4, plot=TRUE)

We may think that v1 is white noise, but our small sample size is the reason that the ACFs fall within the blue lines.

Problem Let \(x_t=0.9w_t - 0.1w_{t-1}\) where \(w_t\) is the usual white noise process with \(\sigma_w^2=1\). Find the true ACF, generate a random realization (\(n=500\)), and find the sample ACF.
Solution Theoretical ACovF \[ \gamma(h) = \begin{cases} (0.81 + 0.01)\sigma_w^2, & h=0 \\ -0.09 \sigma^2_w, & |h|=1 \\ 0, & |h|>1 \end{cases} \]

\[ \rho(h) = \begin{cases} 1, & h=0 \\ -0.1098, & |h|=1 \\ 0, & |h|>1 \end{cases} \]

set.seed(123)
w <- rnorm(500)
x <- filter(w, sides=1, filter=c(0.9, -0.1))[-1] # eliminate first observation
plot.ts(x)

acf(x)

Examine actual values

acf(x, plot=FALSE)
## 
## Autocorrelations of series 'x', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000 -0.156 -0.048  0.015 -0.054 -0.002 -0.046  0.009  0.030  0.025 
##     10     11     12     13     14     15     16     17     18     19 
##  0.005  0.016 -0.106  0.004  0.023  0.082 -0.060  0.010 -0.053  0.015 
##     20     21     22     23     24     25     26 
## -0.005 -0.045  0.080  0.006 -0.008  0.012  0.000

The sample ACF at lag 1 is \(-0.156\), compared with the calculated \(\rho(1)=-0.1098\).

Recall, the cross-correlation function, defined as \[ \rho_{xy}= \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\gamma_y(0)}} \] is a measure of the correlation between two series separated in time by lag \(h\).

Since this is usually unknown, we can use the sample cross-correlation function \[ \hat{\rho}_{xy}(h) = \frac{\hat{\gamma}_{xy}(h)}{\sqrt{\hat{\gamma}_x(0)\hat{\gamma}_y(0)}} \] where \(\hat{\gamma}_{xy}(h) = \frac{1}{n} \sum^{n-h}_{t=1} (x_{t+h} - \overline{x})(y_{t} - \overline{y})\).

Under the hypothesis that there is no relation at lag \(h\) and that at least one of the two series is independent and identically distributed, the distribution of \(\hat{\rho}_{xy}(h)\) is approximately normal with mean zero and standard deviation \(\frac{1}{\sqrt{n}}\).

Partial Autocorrelation Function (PACF)

One can think of the PACF as the simple correlation between two points separated by a lag \(h\), say \(x_t\) and \(x_{t-h}\), with the effect of the intervening points \(x_{t-1}, x_{t-2}, \dots, x_{t-h+1}\) conditioned out.

This is analogous to finding the partial correlation in regression between the response variable and one of the independent variables conditioning out the other independent variables (e.g. \(r_{3|21}\))

Suppose we want to predict \(x_t\) from \(x_{t-1}, \dots, x_{t-h}\) using some linear function. Consider minimizing the mean square prediction error \[ MSE = E[(x_t - \hat{x}_t)^2] \] using the predictor \[ \hat{x}_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_h x_{t-h} \] assuming, for convenience, that \(x_t\) has been adjusted to have mean zero.

Then the PACF is defined as the value of the last coefficient \(a_h\), i.e. \(\Phi_{hh}=a_h\).

The coefficients are between -1 and 1.

The standard error of the estimated PACF is \(\frac{1}{\sqrt{n}}\).

Note: the order of the autoregressive model will be exactly the lag \(h\) beyond which \(\Phi_{hh}=0\).
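For the AR(2) from Section 1.3, \(x_t = x_{t-1} - 0.9x_{t-2} + w_t\), the PACF should therefore show spikes at lags 1 and 2 (with \(\Phi_{22}\) near \(-0.9\)) and cut off afterward. A seeded sketch:

```r
set.seed(4866)
x  <- arima.sim(model = list(ar = c(1, -0.9)), n = 2000)
pp <- pacf(x, lag.max = 10, plot = FALSE)$acf[, , 1]
round(pp, 2)  # lag 2 near -0.9; lags 3+ mostly inside +-2/sqrt(n)
```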

Example (SOI cont’d) Note that the PACF of the SOI has a single peak at lag \(h=1\) and then relatively small values.

This means, in effect, that fairly good predictions can be achieved by using the immediately preceding point, and that adding further values from the past does not really improve this predictability.

par(mfrow=c(2, 1))
acf(soi)
pacf(soi)

The lag axis has been rescaled because the data are monthly: lag 1.0 corresponds to one season (12 months), which is why we see 12 lags per 1.0 unit.

Example (Recruitment series)

acf(rec)

pacf(rec)

par(mfrow=c(1,1))

We see two spikes: the previous month and the month before that include information that could help predict future data.

(Aside: the PACF of white noise would not have any spikes - nearly all would fall within the blue lines)

Note: The ACF of a nonstationary time series decays very slowly as a function of lag. The PACF of a nonstationary time series tends to have a peak very near unity at lag 1, with other values less than the significance level.

Most series are not stationary to begin with. In such cases, we must modify the series to improve the approximation to stationarity by detrending, differencing, transformations, and linear filtering.

April 13, 2016

Time Series Regression

Here we relate the dependent variable \(y_t\) to functions of time.

Most useful when the coefficients of trend and seasonality are not functions of time.

Trend Model
The simplest model is one where there is only a trend component to the series. The general polynomial model is \[ y_t = \beta_0 + \beta_1 t + \beta_2t^2 + \cdots + \beta_p t^p + w_t \] We still require the usual assumptions on \(w_t\), where \(w_t \stackrel{iid}{\sim} N(0, \sigma^2_w)\).

We can use R to get an estimated model and prediction intervals.

Example (gtemp) Polynomial Fit. The poly() function generates the polynomial of the degree specified and all lower levels.

ts.plot(gtemp) # reminder of what plot looks like

fit <- lm(formula=gtemp ~ poly(x=time(x=gtemp), degree=3, raw=TRUE), na.action=NULL ) # raw=TRUE uses raw (not orthogonal) polynomials
plot(resid(fit))

acf(resid(fit))  # exceeding the boundary lines indicates correlation; likely more than white noise remains in the residuals

Using some information criteria, we can try to find an appropriate model.

AIC(fit) # unscaled AIC
## [1] -207.1945
AIC(fit)/length(gtemp) - log(2*pi) # true AIC but not necessary to use b/c comparisons are relative
## [1] -3.431681
AIC(fit, k=log(length(gtemp)))/length(gtemp) - log(2*pi) # BIC
## [1] -3.321391
(AICc <- log(sum(resid(fit)^2)/length(gtemp)) + (length(gtemp)+4)/(length(gtemp)-4-2))  # AICc with k = 4 estimated parameters

Seasonal Variation
We can look at a couple of variations, constant or increasing.

Dummy Variables
- Used in regression to model categorical variables.
- Recall, if you have \(L\) factor levels, then we need \(L-1\) dummy variables.

The full model becomes \[ y_t = \beta_0 + \beta_1 t + \cdots + \beta_pt^p +\beta_{s1}x_{s1,t} + \beta_{s2}x_{s2,t} + \cdots + \beta_{s(L-1)}x_{s(L-1),t} + w_t \] The \(x\) values are 0 or 1, which tells us which seasonal component we are in at time \(t\), e.g. July.

Example (SOI)

month <- rep(1:12, 38)[-c(454:456)] # don't have data for last 3 months (1987)
dummies <- table(1:length(month), as.factor(month)) # 0/1 indicator matrix, one column per month
summary(lm(soi ~ time(soi) + dummies[,1:11])) # December is baseline
## 
## Call:
## lm(formula = soi ~ time(soi) + dummies[, 1:11])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.86157 -0.20401 -0.01132  0.22051  0.84104 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       13.490139   2.492629   5.412 1.03e-07 ***
## time(soi)         -0.006693   0.001266  -5.288 1.96e-07 ***
## dummies[, 1:11]1   0.041992   0.067797   0.619    0.536    
## dummies[, 1:11]2   0.094418   0.067796   1.393    0.164    
## dummies[, 1:11]3  -0.053419   0.067795  -0.788    0.431    
## dummies[, 1:11]4  -0.413335   0.067795  -6.097 2.37e-09 ***
## dummies[, 1:11]5  -0.593830   0.067795  -8.759  < 2e-16 ***
## dummies[, 1:11]6  -0.474536   0.067795  -7.000 9.64e-12 ***
## dummies[, 1:11]7  -0.534978   0.067795  -7.891 2.40e-14 ***
## dummies[, 1:11]8  -0.405552   0.067795  -5.982 4.57e-09 ***
## dummies[, 1:11]9  -0.315547   0.067795  -4.654 4.31e-06 ***
## dummies[, 1:11]10 -0.105386   0.068245  -1.544    0.123    
## dummies[, 1:11]11 -0.019179   0.068245  -0.281    0.779    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2935 on 440 degrees of freedom
## Multiple R-squared:  0.4275, Adjusted R-squared:  0.4118 
## F-statistic: 27.38 on 12 and 440 DF,  p-value: < 2.2e-16

The p-values for January, February, March, October, and November show no significant difference from the December baseline; only roughly April through September differ. This is not surprising, since the insignificant months are those closest to December in the annual cycle.

ts.plot(soi)

library(MASS)

fit.full <- lm(soi ~time(soi) + dummies[, 1:11])

stepAIC(fit.full, direction='both')
## Start:  AIC=-1097.73
## soi ~ time(soi) + dummies[, 1:11]
## 
##                   Df Sum of Sq    RSS      AIC
## <none>                         37.911 -1097.73
## - time(soi)        1    2.4091 40.320 -1071.82
## - dummies[, 1:11] 11   25.7286 63.640  -885.08
## 
## Call:
## lm(formula = soi ~ time(soi) + dummies[, 1:11])
## 
## Coefficients:
##       (Intercept)          time(soi)   dummies[, 1:11]1  
##         13.490139          -0.006693           0.041992  
##  dummies[, 1:11]2   dummies[, 1:11]3   dummies[, 1:11]4  
##          0.094418          -0.053419          -0.413335  
##  dummies[, 1:11]5   dummies[, 1:11]6   dummies[, 1:11]7  
##         -0.593830          -0.474536          -0.534978  
##  dummies[, 1:11]8   dummies[, 1:11]9  dummies[, 1:11]10  
##         -0.405552          -0.315547          -0.105386  
## dummies[, 1:11]11  
##         -0.019179

The output looks unexpected: stepAIC() drops nothing. A likely explanation is that the months enter as a single matrix, so stepAIC() treats dummies[, 1:11] as one 11-df term and can only drop or keep all eleven months together, not month by month.

Creating model by selectively choosing months:

fit2 <- lm(soi ~ time(soi) + dummies[,c(2,4,5,6,7,8,9,10)])
summary(fit2)
## 
## Call:
## lm(formula = soi ~ time(soi) + dummies[, c(2, 4, 5, 6, 7, 8, 
##     9, 10)])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.86147 -0.20390 -0.01137  0.22124  0.84107 
## 
## Coefficients:
##                                          Estimate Std. Error t value
## (Intercept)                             13.494042   2.489396   5.421
## time(soi)                               -0.006699   0.001264  -5.298
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]2   0.102043   0.053254   1.916
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]4  -0.405709   0.053254  -7.618
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]5  -0.586204   0.053254 -11.008
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]6  -0.466909   0.053255  -8.767
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]7  -0.527350   0.053255  -9.902
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]8  -0.397924   0.053256  -7.472
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]9  -0.307918   0.053257  -5.782
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]10 -0.097760   0.053825  -1.816
##                                         Pr(>|t|)    
## (Intercept)                             9.78e-08 ***
## time(soi)                               1.85e-07 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]2     0.056 .  
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]4  1.57e-13 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]5   < 2e-16 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]6   < 2e-16 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]7   < 2e-16 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]8  4.25e-13 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]9  1.40e-08 ***
## dummies[, c(2, 4, 5, 6, 7, 8, 9, 10)]10    0.070 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2932 on 443 degrees of freedom
## Multiple R-squared:  0.4247, Adjusted R-squared:  0.413 
## F-statistic: 36.34 on 9 and 443 DF,  p-value: < 2.2e-16
par(mfrow=c(2,1))
plot(time(soi), fitted(fit2), col='red', type='l') # fitted model
#lines(soi) # actual observations, too confusing when superimposed
plot(soi) # actual observations, separate plot

par(mfrow=c(1,1))
acf(resid(fit2))

The ACF of the residuals clearly shows more than white noise, which means we should be able to find a better model.
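The dashed bounds on an ACF plot sit at roughly ±1.96/√n. A sketch on simulated AR(1) "residuals" (hypothetical data, not the SOI fit) showing how correlated residuals spill past them:

```r
# Sketch: correlated residuals exceed the approximate white-noise
# bounds +/- 1.96/sqrt(n) on the ACF plot.
set.seed(2)
e <- arima.sim(model=list(ar=0.6), n=200)   # correlated, not white noise
a <- acf(e, plot=FALSE)
bound <- 1.96/sqrt(length(e))
sum(abs(a$acf[-1]) > bound)   # number of lags beyond the bound
```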

Example Consider the bike data, which represent four years of quarterly mountain bike sales. Fit a time series regression model and forecast out one year.

quarter <- rep(1:4, 4)
bike <- ts(c(10,31,43,16,11,33,45,17,13,34,48,19,15,37,51,21), frequency=4, start=c(2007, 1)) #starts 1st qtr 2007
ts.plot(bike)

There is a seasonal component. Many more bikes are sold in the summer than the winter.

(Data is obviously fake; real data would never exhibit such a clear pattern.)

fit.bike <- lm(bike ~ time(bike) + factor(quarter))
summary(fit.bike)
## 
## Call:
## lm(formula = bike ~ time(bike) + factor(quarter))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -0.75  -0.25  -0.25   0.25   1.25 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -4004.7500   302.7930 -13.226 4.25e-08 ***
## time(bike)           2.0000     0.1508  13.266 4.12e-08 ***
## factor(quarter)2    21.0000     0.4782  43.913 1.04e-13 ***
## factor(quarter)3    33.5000     0.4827  69.408 6.89e-16 ***
## factor(quarter)4     4.5000     0.4900   9.184 1.72e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6742 on 11 degrees of freedom
## Multiple R-squared:  0.9983, Adjusted R-squared:  0.9977 
## F-statistic:  1645 on 4 and 11 DF,  p-value: 3.439e-15

The p-values are all highly significant. There’s no need for model selection.

acf(resid(fit.bike))

Finally, a good model: the residual ACF is consistent with white noise.

Projecting into the future.

pred <- c(fitted(fit.bike), -4004.75+2*c(2011, 2011.25, 2011.5, 2011.75) + c(0,21,33.5,4.5)) # fitted values plus 2011 projection
plot(x=c(time(bike), 2011, 2011.25, 2011.5, 2011.75), y=pred, col='red', type='l', xlab='') 
lines(bike, type='o') # observed values
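The hand-computed projection above can also be obtained from predict(). A sketch that refits the same model with explicit time and quarter columns (so that newdata works, which it would not with time(bike) inside the formula):

```r
# Sketch: same bike model with explicit predictors, forecast via predict().
sales <- c(10,31,43,16,11,33,45,17,13,34,48,19,15,37,51,21)
tt <- seq(2007, by=0.25, length.out=16)   # quarterly time index
qtr <- factor(rep(1:4, 4))
fit <- lm(sales ~ tt + qtr)
new <- data.frame(tt=seq(2011, by=0.25, length.out=4), qtr=factor(1:4))
predict(fit, newdata=new, interval="prediction")   # 2011 forecasts with PIs
```

An advantage of this form is that predict() also returns prediction intervals, which the hand computation does not.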